Globally, unrelated protein sequences appear random
Abstract
MOTIVATION: To test whether protein folding constraints and secondary structure sequence preferences significantly reduce the space of amino acid words in proteins, we compared the frequencies of four- and five-amino acid word clumps (independent words) in proteins to the frequencies predicted by four random sequence models.

RESULTS: While the human proteome has many overrepresented word clumps, these words come from large protein families with biased compositions (e.g. Zn-fingers). In contrast, in a non-redundant sample of Pfam-AB, only 1% of four-amino acid word clumps (4.7% of 5mer words) are 2-fold overrepresented compared with our simplest random model [MC(0)], and 0.1% (4mers) to 0.5% (5mers) are 2-fold overrepresented compared with a window-shuffled random model. Using a false discovery rate q-value analysis, the number of exceptional four- or five-letter words in real proteins is similar to the number found when comparing words from one random model to another. Consensus overrepresented words are not enriched in conserved regions of proteins, but four-letter words are enriched 1.18- to 1.56-fold in alpha-helical secondary structures (but not beta-strands). Five-residue consensus exceptional words are enriched for alpha-helix 1.43- to 1.61-fold. Protein word preferences in regular secondary structure do not appear to significantly restrict the use of sequence words in unrelated proteins, although the consensus exceptional words have a secondary structure bias for alpha-helix. Globally, words in protein sequences appear to be under very few constraints; for the most part, they appear to be random.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
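The comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the `window` size, and the approximation used for the expected count are all assumptions made here for illustration. It counts word clumps (a run of overlapping occurrences of the same word counts once), estimates the expected count under a composition-only MC(0) model as the number of word positions times the product of residue frequencies (ignoring any self-overlap correction), and includes a simple window shuffle that preserves local composition.

```python
import random
from collections import Counter
from math import prod


def count_clumps(seqs, k):
    """Count word clumps: a run of overlapping occurrences of one word counts once."""
    counts = Counter()
    for seq in seqs:
        last_start = {}  # word -> start index of its most recent occurrence
        for i in range(len(seq) - k + 1):
            word = seq[i:i + k]
            prev = last_start.get(word)
            # A new clump starts only if this occurrence does not overlap the previous one.
            if prev is None or i - prev >= k:
                counts[word] += 1
            last_start[word] = i
    return counts


def mc0_expected(seqs, k):
    """Return a function giving the approximate expected count of a word under MC(0)."""
    residues = Counter("".join(seqs))
    total = sum(residues.values())
    freq = {aa: n / total for aa, n in residues.items()}
    positions = sum(max(len(s) - k + 1, 0) for s in seqs)
    # Assumption: residues are independent, so word probability is a product of frequencies.
    return lambda word: positions * prod(freq[aa] for aa in word)


def window_shuffle(seq, window, rng):
    """Shuffle residues inside consecutive windows (window size is a hypothetical parameter)."""
    out = []
    for start in range(0, len(seq), window):
        chunk = list(seq[start:start + window])
        rng.shuffle(chunk)
        out.extend(chunk)
    return "".join(out)


def overrepresented(seqs, k, fold=2.0):
    """Return words whose observed clump count is at least `fold` times the MC(0) estimate."""
    observed = count_clumps(seqs, k)
    expected = mc0_expected(seqs, k)
    ratios = {w: n / expected(w) for w, n in observed.items()}
    return {w: r for w, r in ratios.items() if r >= fold}
```

On a real data set one would call `overrepresented(sequences, 4)` on a non-redundant sequence sample and repeat the count on `window_shuffle` output to obtain a second baseline; the q-value analysis mentioned in the abstract is not covered by this sketch.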
Related articles
Amino Acid Substitution Scores
GNPKVKAH Here we discuss standard ways of assigning a score to each amino acid pair, i.e., to each possible column of a gap-free pairwise protein alignment. Examples of such scoring matrices include the PAM30, PAM70, BLOSUM80, BLOSUM62 and BLOSUM45 matrices that are available on NCBI’s blastp server. Such scores are appropriate for comparing two sequences about which we have no other informatio...
Assembly of protein tertiary structures from fragments with similar local sequences using simulated annealing and Bayesian scoring functions.
We explore the ability of a simple simulated annealing procedure to assemble native-like structures from fragments of unrelated protein structures with similar local sequences using Bayesian scoring functions. Environment and residue pair specific contributions to the scoring functions appear as the first two terms in a series expansion for the residue probability distributions in the protein d...
Optimal Stopping Policy for Multivariate Sequences a Generalized Best Choice Problem
In the classical versions of “Best Choice Problem”, the sequence of offers is a random sample from a single known distribution. We present an extension of this problem in which the sequential offers are random variables but from multiple independent distributions. Each distribution function represents a class of investment or offers. Offers appear without any specified order. The objective is...
Insights into the amyloid folding problem from solid-state NMR.
Amyloid fibrils are filamentous aggregates, with typical diameters of 10 nm and lengths on the order of microns, formed by a large class of peptides and proteins with disparate sequences and with molecular masses ranging from less than 1 kDa to tens of kilodaltons. Figure 1 shows typical amyloid fibrils as they appear in electron microscopy (EM)1 measurements. Current interest in amyloid fibril...
Journal: Bioinformatics
Volume 26, Issue 3
Publication date: 2010